William Loving (wfl9zy) James Sweat (jes9hd)
- Explore and visualize the broader Computer Science / Data Science industry fields.
- Discover interesting correlations between attributes of available jobs across multiple datasets.
- Learn how to develop meaningful visualizations to communicate the data we have to an uninformed audience.
- Here we explore data scientist jobs in and around the United States.
- Our main goal is to visualize which jobs pay the most under different factors, and to check whether the data shows any correlations or patterns.
library(tidyverse)  # provides read_csv(), %>%, and mutate() used below
data <- read_csv("../data/data-science-jobs/ds_salaries.csv")
head(data)
## # A tibble: 6 × 12
## ...1 work_year experience_level employment_type job_title salary
## <dbl> <dbl> <chr> <chr> <chr> <dbl>
## 1 0 2020 MI FT Data Scientist 70000
## 2 1 2020 SE FT Machine Learning Scie… 260000
## 3 2 2020 SE FT Big Data Engineer 85000
## 4 3 2020 MI FT Product Data Analyst 20000
## 5 4 2020 SE FT Machine Learning Engi… 150000
## 6 5 2020 EN FT Data Analyst 72000
## # ℹ 6 more variables: salary_currency <chr>, salary_in_usd <dbl>,
## # employee_residence <chr>, remote_ratio <dbl>, company_location <chr>,
## # company_size <chr>
str(data)
## spc_tbl_ [607 × 12] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ ...1 : num [1:607] 0 1 2 3 4 5 6 7 8 9 ...
## $ work_year : num [1:607] 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 ...
## $ experience_level : chr [1:607] "MI" "SE" "SE" "MI" ...
## $ employment_type : chr [1:607] "FT" "FT" "FT" "FT" ...
## $ job_title : chr [1:607] "Data Scientist" "Machine Learning Scientist" "Big Data Engineer" "Product Data Analyst" ...
## $ salary : num [1:607] 70000 260000 85000 20000 150000 72000 190000 11000000 135000 125000 ...
## $ salary_currency : chr [1:607] "EUR" "USD" "GBP" "USD" ...
## $ salary_in_usd : num [1:607] 79833 260000 109024 20000 150000 ...
## $ employee_residence: chr [1:607] "DE" "JP" "GB" "HN" ...
## $ remote_ratio : num [1:607] 0 0 50 0 50 100 100 50 100 50 ...
## $ company_location : chr [1:607] "DE" "JP" "GB" "HN" ...
## $ company_size : chr [1:607] "L" "S" "M" "S" ...
## - attr(*, "spec")=
## .. cols(
## .. ...1 = col_double(),
## .. work_year = col_double(),
## .. experience_level = col_character(),
## .. employment_type = col_character(),
## .. job_title = col_character(),
## .. salary = col_double(),
## .. salary_currency = col_character(),
## .. salary_in_usd = col_double(),
## .. employee_residence = col_character(),
## .. remote_ratio = col_double(),
## .. company_location = col_character(),
## .. company_size = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
summary(data)
## ...1 work_year experience_level employment_type
## Min. : 0.0 Min. :2020 Length:607 Length:607
## 1st Qu.:151.5 1st Qu.:2021 Class :character Class :character
## Median :303.0 Median :2022 Mode :character Mode :character
## Mean :303.0 Mean :2021
## 3rd Qu.:454.5 3rd Qu.:2022
## Max. :606.0 Max. :2022
## job_title salary salary_currency salary_in_usd
## Length:607 Min. : 4000 Length:607 Min. : 2859
## Class :character 1st Qu.: 70000 Class :character 1st Qu.: 62726
## Mode :character Median : 115000 Mode :character Median :101570
## Mean : 324000 Mean :112298
## 3rd Qu.: 165000 3rd Qu.:150000
## Max. :30400000 Max. :600000
## employee_residence remote_ratio company_location company_size
## Length:607 Min. : 0.00 Length:607 Length:607
## Class :character 1st Qu.: 50.00 Class :character Class :character
## Mode :character Median :100.00 Mode :character Mode :character
## Mean : 70.92
## 3rd Qu.:100.00
## Max. :100.00
# Replace the dataset's short codes with readable labels.
# "MI" is the dataset's code for mid-level (intermediate) roles.
data_transformed <- data %>%
  mutate(
    experience_level = case_when(
      experience_level == "EN" ~ "Entry-Level",
      experience_level == "MI" ~ "Mid-Level",
      experience_level == "SE" ~ "Senior-Level",
      experience_level == "EX" ~ "Executive-Level",
      TRUE ~ experience_level
    ),
    employment_type = case_when(
      employment_type == "CT" ~ "Contract-Work",
      employment_type == "FT" ~ "Full-Time",
      employment_type == "PT" ~ "Part-Time",
      employment_type == "FL" ~ "Freelance",
      TRUE ~ employment_type
    ),
    company_size = case_when(
      company_size == "L" ~ "Large",
      company_size == "M" ~ "Medium",
      company_size == "S" ~ "Small",
      TRUE ~ company_size
    ),
    # remote_ratio is numeric, so any unmatched value is converted to text.
    remote_ratio = case_when(
      remote_ratio == 0 ~ "In-Person",
      remote_ratio == 50 ~ "Hybrid",
      remote_ratio == 100 ~ "Remote",
      TRUE ~ as.character(remote_ratio)
    )
  )
head(data_transformed)
## # A tibble: 6 × 12
## ...1 work_year experience_level employment_type job_title salary
## <dbl> <dbl> <chr> <chr> <chr> <dbl>
## 1 0 2020 Mid-Level Full-Time Data Scientist 70000
## 2 1 2020 Senior-Level Full-Time Machine Learning Scie… 260000
## 3 2 2020 Senior-Level Full-Time Big Data Engineer 85000
## 4 3 2020 Mid-Level Full-Time Product Data Analyst 20000
## 5 4 2020 Senior-Level Full-Time Machine Learning Engi… 150000
## 6 5 2020 Entry-Level Full-Time Data Analyst 72000
## # ℹ 6 more variables: salary_currency <chr>, salary_in_usd <dbl>,
## # employee_residence <chr>, remote_ratio <chr>, company_location <chr>,
## # company_size <chr>
- This plot shows that salary rises with experience level.
- Employment types also behave differently: for example, contract salaries are much more volatile than full-time salaries.
- In-Person work paid the most only at Medium companies; at Large companies, Remote work paid the most.
- At Small companies, pay grows step-wise across the remote-ratio categories (Hybrid -> In-Person -> Remote).
- There is a lot of information here, but the most interesting finding is that the US has the highest-paying jobs by far, with Small companies in Japan a close second.
- This has been a look at the data science job market, examining salary as it relates to company size, each company's remote ratio, and the experience level required for each position. Part 2 moves on to visuals focused on software engineering jobs in India.
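
The plot-based observations above can also be checked numerically with a grouped summary. The following is a minimal sketch, not the code used to produce the original plots: `salary_by_group()` is a hypothetical helper, and the choice of the median and interquartile range of `salary_in_usd` as the "pay" and "volatility" measures is an assumption. It reports the group size `n` as well, since a group with only a few rows (for example, one company size within a single country) can rank highly by chance.

```r
library(dplyr)

# Hypothetical helper (not part of the original analysis): group size, median,
# and interquartile range of USD salary for any combination of grouping columns.
salary_by_group <- function(df, ...) {
  df %>%
    group_by(...) %>%
    summarise(
      n = n(),
      median_usd = median(salary_in_usd),
      iqr_usd = IQR(salary_in_usd),
      .groups = "drop"
    ) %>%
    arrange(desc(median_usd))
}

# Run the three comparisons only if the transformed table is in scope.
if (exists("data_transformed")) {
  print(salary_by_group(data_transformed, experience_level, employment_type))
  print(salary_by_group(data_transformed, company_size, remote_ratio))
  print(salary_by_group(data_transformed, company_location, company_size))
}
```

Sorting by `median_usd` puts the highest-paying groups first, and comparing `iqr_usd` across employment types gives a rough view of how spread out each type's salaries are.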